What Happens To BERT Embeddings During Fine-tuning?

Codebase: https://github.com/r05323028/What_happens_to_bert_embeddings_during_fintuning

In this notebook, we try to reproduce "What Happens To BERT Embeddings During Fine-tuning?", a paper accepted at the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (EMNLP 2020).


Load Data & Model

In this section, we load the wiki dataset along with three models: bert-base-uncased, the pretrained model released by Google; bert-mnli, fine-tuned on the GLUE/MNLI dataset; and bert-squad, fine-tuned on SQuAD.
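A minimal sketch of the model loading with the Hugging Face `transformers` library. The two fine-tuned checkpoint names below are community-uploaded models chosen for illustration; they are assumptions, not necessarily the exact checkpoints used in the paper or the companion repository.

```python
# Hypothetical checkpoint names: the fine-tuned entries are illustrative
# community models, not confirmed to match the paper's checkpoints.
CHECKPOINTS = {
    "bert-base": "bert-base-uncased",
    "bert-mnli": "textattack/bert-base-uncased-MNLI",
    "bert-squad": "csarron/bert-base-uncased-squad-v1",
}

def load_model(name: str):
    """Return a (tokenizer, model) pair with all hidden states exposed."""
    # Imported lazily so the checkpoint mapping can be inspected
    # without downloading any weights.
    from transformers import AutoModel, AutoTokenizer

    ckpt = CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    # output_hidden_states=True makes the model return the hidden
    # states of every layer, which the analyses below rely on.
    model = AutoModel.from_pretrained(ckpt, output_hidden_states=True)
    model.eval()
    return tokenizer, model
```

Passing `output_hidden_states=True` is what gives us per-layer representations rather than only the final layer.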

Representational Similarity Analysis (RSA)

We use the pretrained model and the fine-tuned models to extract the hidden states of each layer, and then compare the resulting representations via cosine similarity.
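A sketch of the RSA computation with NumPy, under the assumption that cosine similarity is used to build each model's token-by-token similarity matrix and Pearson correlation to compare the two matrices (a common RSA setup; the paper's exact kernel choices may differ).

```python
import numpy as np

def rsa_similarity(acts_a: np.ndarray, acts_b: np.ndarray) -> float:
    """RSA score between two sets of hidden states.

    acts_a, acts_b: (n_tokens, dim) hidden states for the same tokens
    from two models (or two layers). Each set is turned into an
    n_tokens x n_tokens cosine-similarity matrix; the upper triangles
    of the two matrices are then compared with Pearson correlation.
    """
    def sim_matrix(x):
        # Row-normalize so the Gram matrix holds cosine similarities.
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        return x @ x.T

    # Compare only the strict upper triangle (off-diagonal pairs).
    iu = np.triu_indices(len(acts_a), k=1)
    a = sim_matrix(acts_a)[iu]
    b = sim_matrix(acts_b)[iu]
    return float(np.corrcoef(a, b)[0, 1])
```

Comparing a layer's representations with themselves yields a score of 1, and less similar representational geometries drive the score toward 0.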

From the figure above, we can conclude that

Structural Probe

A structural probe is a method for evaluating whether a word representation model captures syntactic structure. The main idea is that, if it does, the syntax tree structure should be preserved under a linear projection of the representations. The following figure shows this concept intuitively.

header.png

So, how do we evaluate this? We can train a probe model to predict the tree distance, i.e. the number of edges in the parse tree, between every pair of tokens.

distances.png
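The pairwise prediction above can be sketched in NumPy: given representations `H` and a learned linear probe `B` (the variable names are ours, not from the paper), the probe predicts the tree distance between tokens i and j as the squared distance between their projections.

```python
import numpy as np

def probe_distances(H: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Predicted squared tree distances under a linear structural probe.

    H: (n_tokens, dim) word representations for one sentence.
    B: (rank, dim) learned probe matrix.
    Returns an (n_tokens, n_tokens) matrix whose (i, j) entry is
    ||B h_i - B h_j||^2, the probe's prediction of the parse-tree
    distance between tokens i and j.
    """
    P = H @ B.T                        # project into the probe subspace
    diff = P[:, None, :] - P[None, :, :]
    return (diff ** 2).sum(axis=-1)    # pairwise squared distances
```

Training the probe means fitting `B` so that these predicted distances match the gold parse-tree distances; the resulting matrix is symmetric with a zero diagonal, as any distance matrix must be.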

The following figure shows that a probe model can project word representations into a subspace that preserves the syntax tree structure of the sentence.

space-dist.png
